Using Prefetching to Hide Lock Acquisition Latency in Distributed Virtual Shared Memory Systems
Authors
Abstract
Synchronization overhead may limit the number of applications that can take advantage of a shared-memory abstraction on top of emerging network-of-workstations organizations. While the programmer could spend additional effort on eliminating such overhead by restructuring the computation, this paper focuses on a simpler approach in which the overhead of lock operations is hidden through lock prefetch annotations. Our approach aims to hide the lock acquisition latency by prefetching the lock ahead of time. This paper presents a compiler approach that successfully inserted lock prefetch annotations automatically in five out of eight applications. In the remaining three, we show that hand insertion could be done fairly easily without any prior knowledge of the applications. We also study the performance improvements of this approach in detail by considering network-of-workstations organizations built from uniprocessor as well as symmetric multiprocessor nodes, using emerging interconnect technologies such as ATM. We show that the significant latencies of such interconnects have a dramatic effect on lock acquisition overhead, and that this overhead can be drastically reduced by lock prefetching. Overall, lock prefetching is a simple and effective approach that allows more fine-grained applications to run well on emerging network-of-workstations platforms.
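The core idea of the abstract, hiding lock acquisition latency by initiating the acquisition before the critical section is reached, can be illustrated with a minimal simulation. This is a hypothetical sketch, not the paper's annotation mechanism or DSM protocol: the simulated network round-trip (`acquire_with_latency`) and the names used here are assumptions for illustration only.

```python
import threading
import time

lock = threading.Lock()

def acquire_with_latency(l, acquired):
    # Simulate the network round-trip of acquiring a lock in a
    # distributed virtual shared memory system (assumed 50 ms latency).
    time.sleep(0.05)
    l.acquire()
    acquired.set()

def critical_section_with_prefetch():
    acquired = threading.Event()
    # "Prefetch" the lock: start the acquisition ahead of time so its
    # latency overlaps with independent local work.
    prefetcher = threading.Thread(target=acquire_with_latency,
                                  args=(lock, acquired))
    prefetcher.start()

    # Work that does not touch the lock-protected data proceeds while
    # the acquisition is in flight.
    independent_work = sum(i * i for i in range(10000))

    acquired.wait()  # by now much of the acquisition latency is hidden
    try:
        shared_result = independent_work + 1  # the critical section
    finally:
        lock.release()
    prefetcher.join()
    return shared_result
```

Without the prefetch, the caller would pay the full acquisition round-trip after finishing its local work; with it, the two overlap, which is the latency-hiding effect the paper's annotations target.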
Similar articles
Lock Prefetching in Distributed Virtual Shared Memory Systems — Initial Results
Synchronization overhead may limit the number of applications that can take advantage of a shared-memory abstraction on top of emerging network-of-workstations organizations. While the programmer could spend additional effort on eliminating such overhead by restructuring the computation, this paper focuses on a simpler approach in which the overhead of lock operations is hidden through lock p...
Maintaining Cache Coherence through Compiler-Directed Data Prefetching
In this paper, we propose a compiler-directed cache coherence scheme which makes use of data prefetching to enforce cache coherence in large-scale distributed shared-memory (DSM) systems. The Cache Coherence with Data Prefetching (CCDP) scheme uses compiler analyses to identify potentially-stale and non-stale data references in a parallel program and enforces cache coherence by prefetching the ...
Transactional Distributed Shared Memory
We present a new transaction-based approach to distributed shared memory, an object caching framework, language extensions to support our approach, path-expression-based prefetches, and an analysis to generate path expression prefetches. To our knowledge, this is the first prefetching approach that can prefetch objects whose addresses have not been computed or predicted. Our approach makes aggr...
Efficient Integration of Compiler-Directed Cache Coherence and Data Prefetching
Cache coherence enforcement and memory latency reduction and hiding are very important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to solve these problems through a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coh...
The Intelligent Cache Controller of a Massively Parallel Processor JUMP-1
This paper describes the intelligent cache controller of JUMP-1, a distributed shared memory type MPP. JUMP-1 adopts an off-the-shelf superscalar as the element processor to meet the requirement of peak performance, but such a processor lacks the ability to hide inter-processor communication latency, which may easily become too long on MPPs. Therefore JUMP-1 provides an intelligent memory syste...